Fuzzy Clustering of Short Time-Series and Unevenly Distributed Sampling Points
Abstract
This paper proposes a new clustering algorithm in the fuzzy c-means family, designed to cluster time series and particularly suited to short time series and to series with unevenly spaced sampling points. Short time series, which are too short to support a conventional statistical model, and unevenly sampled time series arise in many practical situations. The algorithm developed here is motivated by experiments in biology, where uneven sampling is common. Conventional clustering algorithms based on the Euclidean distance or the Pearson correlation coefficient, such as hard k-means or hierarchical clustering, cannot include temporal information in the distance measure. The temporal order of the data is important, and the varying length of the sampling intervals should be taken into account when clustering time series. The proposed short time series (STS) distance measures the similarity of shapes formed by the relative change of amplitude and the corresponding temporal information. We develop a fuzzy short time series (FSTS) clustering algorithm by incorporating the STS distance into the standard fuzzy clustering scheme. An example illustrates the performance of the proposed FSTS clustering algorithm in comparison with fuzzy c-means, k-means and single-linkage hierarchical clustering.
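The abstract does not state the STS distance formula, so the following is only a minimal sketch of one plausible reading: the distance compares the slopes of the two series over each sampling interval, which captures relative change of amplitude together with interval length. The function names `sts_distance` and `fuzzy_memberships`, the fuzzifier default `m=2.0`, and the slope-based formula itself are assumptions made for illustration, not the paper's definitions. The membership step is the standard fuzzy c-means update with the distance swapped in; the prototype update of the FSTS algorithm is not sketched here.

```python
def sts_distance(x, v, t):
    """Slope-based distance between two series sampled at the same times t.

    Assumed form: square root of the summed squared differences between
    the piecewise slopes of x and v over each sampling interval.
    """
    if not (len(x) == len(v) == len(t)):
        raise ValueError("x, v and t must have the same length")
    if len(t) < 2:
        raise ValueError("need at least two sampling points")
    total = 0.0
    for k in range(len(t) - 1):
        dt = t[k + 1] - t[k]
        if dt <= 0:
            raise ValueError("sampling times must be strictly increasing")
        slope_x = (x[k + 1] - x[k]) / dt
        slope_v = (v[k + 1] - v[k]) / dt
        total += (slope_x - slope_v) ** 2
    return total ** 0.5


def fuzzy_memberships(x, prototypes, t, m=2.0):
    """Standard fuzzy c-means membership update for one series x.

    Uses sts_distance in place of the Euclidean distance; m > 1 is the
    fuzzifier. Returns one membership degree per prototype, summing to 1.
    """
    d = [sts_distance(x, v, t) for v in prototypes]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:
        # x coincides in shape with one or more prototypes: share membership.
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    exponent = 2.0 / (m - 1.0)
    weights = [di ** -exponent for di in d]
    total = sum(weights)
    return [w / total for w in weights]
```

Under this assumed formula, two series with the same shape but a constant offset, such as `[0, 1, 2]` and `[5, 6, 7]` sampled at times `[0, 1, 3]`, have distance 0, whereas the Euclidean distance between them is nonzero. Widening a sampling interval reduces the weight of an amplitude change over that interval, which is how uneven sampling enters the comparison.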
Similar resources
Clustering of unevenly sampled gene expression time-series data
Time course measurements are becoming a common type of experiment in the use of microarrays. The temporal order of the data and the varying length of sampling intervals are important and should be considered in clustering time-series. However, the shortness of gene expression time-series data limits the use of conventional statistical models and techniques for time-series analysis. To address ...
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks has led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
Fuzzy clustering of time series data: A particle swarm optimization approach
With rapid development in information gathering technologies and access to large amounts of data, we always require methods for analyzing data and extracting useful information from large raw datasets, and data mining is an important method for solving this problem. Clustering analysis, as the most commonly used function of data mining, has attracted many researchers in computer science. Because o...
A Fuzzy Approach for Clustering Gene Expression Time Series Data
Identifying groups of genes that manifest similar expression patterns is crucial in the analysis of gene expression time series data. Choosing a similarity measure to determine the similarity or distance between profiles is an important task. Time series expression experiments are used to study a wide range of biological systems. More than 80% of all time series expression datasets are short (8...
Combination of Transformed-means Clustering and Neural Networks for Short-Term Solar Radiation Forecasting
In order to provide an efficient conversion and utilization of solar power, solar radiation data should be measured continuously and accurately over the long-term period. However, the measurement of solar radiation is not available to all countries in the world due to some technical and fiscal limitations. Hence, several studies were proposed in the literature to find mathematical and physical mod...
Publication date: 2003